Efficient Reinforcement Learning in Parameterized Models: Discrete Parameter Case

Authors

  • Kirill Dyagilev
  • Shie Mannor
  • Nahum Shimkin
Abstract

We consider reinforcement learning in the parameterized setup, where the model is known to belong to a parameterized family of Markov Decision Processes (MDPs). We further impose here the assumption that the set of possible parameters is finite, and consider the discounted return. We propose an on-line algorithm for learning in such parameterized models, dubbed the Parameter Elimination (PEL) algorithm, and analyze its performance in terms of the total mistake bound criterion (also known as the sample complexity of exploration). The algorithm relies on Wald’s Sequential Probability Ratio Test to eliminate unlikely parameters, and uses an optimistic policy for effective exploration. We establish that, with high probability, the total mistake bound for the algorithm is linear (up to a logarithmic term) in the size of the parameter space, independently of the cardinality of the state and action spaces.
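The elimination step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the observation model, the accumulated log-likelihood scores, and the threshold value are all hypothetical, and only the SPRT-style idea of discarding parameters whose likelihood ratio against the leading candidate falls below a threshold is retained.

```python
def sprt_eliminate(log_likelihoods, threshold):
    """Keep only parameters whose log-likelihood ratio against the
    best-scoring parameter is at least -threshold (Wald's SPRT idea).

    log_likelihoods: dict mapping each candidate parameter to the
    accumulated log-likelihood of the observed transitions under the
    MDP induced by that parameter (hypothetical values here).
    """
    best = max(log_likelihoods.values())
    return {theta: ll for theta, ll in log_likelihoods.items()
            if ll - best >= -threshold}

# Toy example: three candidate parameters with illustrative scores.
scores = {"theta1": -2.0, "theta2": -15.0, "theta3": -3.5}
surviving = sprt_eliminate(scores, threshold=10.0)
# theta2 trails the leader by 13 > 10 nats and is eliminated;
# theta1 and theta3 survive to the next round of exploration.
```

In the algorithm's spirit, exploration would then proceed optimistically with respect to the surviving parameter set, and the test is repeated as more transitions are observed.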


Similar Articles

Efficient Reinforcement Learning in Parameterized Models: The Parameter Elimination Approach

We consider reinforcement learning in a parameterized setup, where the model is known to belong to a compact set of Markov Decision Processes (MDPs), under the discounted return criteria. We propose an on-line algorithm for efficient learning in such parameterized models, the Approximate Parameter Elimination (APEL) algorithm. The algorithm accepts a dense finite grid i...


Reinforcement Learning with Polynomial Learning Rate in Parameterized Models

We consider reinforcement learning in a parameterized setup, where the model is known to belong to a finite set of Markov Decision Processes (MDPs) under the discounted return criterion. We propose an on-line algorithm for learning in such parameterized models, the Parameter Elimination (PEL) algorithm, and analyze its performance in terms of the total mistakes. The algorithm relies on Wald’s s...


Active exploration in parameterized reinforcement learning

Online model-free reinforcement learning (RL) methods with continuous actions are playing a prominent role when dealing with real-world applications such as Robotics. However, when confronted with non-stationary environments, these methods crucially rely on an exploration-exploitation trade-off which is rarely dynamically and automatically adjusted to changes in the environment. Here we propose a...


Deep Reinforcement Learning in Parameterized Action Space

Recent work has shown that deep neural networks are capable of approximating both value functions and policies in reinforcement learning domains featuring continuous state and action spaces. However, to the best of our knowledge no previous work has succeeded at using deep neural networks in structured (parameterized) continuous action spaces. To fill this gap, this paper focuses on learning wi...


Actor-Critic Reinforcement Learning with Energy-Based Policies

We consider reinforcement learning in Markov decision processes with high dimensional state and action spaces. We parametrize policies using energy-based models (particularly restricted Boltzmann machines), and train them using policy gradient learning. Our approach builds upon Sallans and Hinton (2004), who parameterized value functions using energy-based models, trained using a non-linear var...




Publication date: 2008